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Abstract — We consider the problem of recursively reconstructing time 
sequences of sparse signals (with unknown and time-varying sparsity 
patterns) from a limited number of linear incoherent measurements with 
additive noise. The idea of our proposed solution, KF CS-residual (KF- 
CS) is to replace compressed sensing (CS) on the observation by CS 
on the Kalman filtered (KF) observation residual computed using the 
previous estimate of the support. KF-CS error stability over time is 
studied. Simulation comparisons with CS and LS-CS are shown. 

I. Introduction 

Consider the problem of recursively and causally reconstructing 
time sequences of spatially sparse signals (with unknown and time- 
varying sparsity patterns) from a limited number of linear incoherent 
measurements with additive noise. The signals are sparse in some 
transform domain referred to as the sparsity basis. Important applica- 
tions include dynamic MRI reconstruction for real-time applications 
such as MRI-guided surgery, single-pixel video imaging 0, or video 
compression. Due to strong temporal dependencies in the signal 
sequence, it is usually valid to assume that its sparsity pattern 
(support of the sparsity transform vector) changes slowly over time. 
This was verified in |4), 0. 

The solution to the static version of the above problem is provided 
by compressed sensing (CS) (6), (7). CS for noisy observations, 
e.g. Dantzig selector (8), Lasso (9), or Basis Pursuit Denoising 
(BPDN) 1101 , 1111 . have been shown to have small error as long 
as incoherence assumptions hold. Most existing solutions for the 
dynamic problem, e.g. (3), 1121 . are non-causal and batch solutions. 
Batch solutions process the entire time sequence in one go and thus 
have much higher reconstruction complexity. An alternative would 
be to apply CS at each time separately (simple CS), which is online 
and low-complexity, but since it does not use past observations, its 
reconstruction error is much larger when the number of available 
observations is small. Our goal is to develop a recursive solution 
that improves the accuracy of simple CS by using past observations, 
but keeps the reconstruction complexity similar to that of simple CS. 
By "recursive ", we mean a solution that uses only the previous signal 
estimate and the current observation vector at the current time. 

In this work, we propose a solution called KF-CS-residual (KF- 
CS) which is motivated by reformulating the above problem as causal 
minimum mean squared error (MMSE) estimation with a slow time- 
varying set of dominant basis directions (or equivalently the support 
of the sparsity basis coefficients' vector). If the support is known, and 
a linear Gaussian prior dynamic model is assumed for the nonzero 
coefficients, the causal MMSE solution is given by the Kalman filter 
(KF) 1 13 1 for this support. When the support is unknown and time- 
varying, the initial support can be estimated using CS. Whenever 
there is an addition to the support, it can be estimated by running 
CS on the KF residual, followed by thresholding. This new support 
estimate can be used to run the KF at the next time instant. If some 
coefficients become and remain nearly zero, they can be removed 
from the support set. Both the computational and storage complexity 
of KF-CS is similar to that of simple CS - 0(m' ! ) at a given time 
where m is the signal length |141 Table 1] and 0(Nm 3 ) for an 
N length sequence. This is significantly lower than 0(N 3 m 3 ) for 
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batch CS. Note that a full KF, that does not use the knowledge 
that the signal is sparse, is meaningless here, because the number 
of observations available is smaller than the signal dimension, and 
thus many elements of the state (sparsity basis coefficients vector) 
will be unobservable. Unless all unobservable modes are stable, the 
error will blow up 1131 , fTI . 

The most closely related work to KF-CS is our work on LS-CS |2|, 
t4l which uses an LS residual instead of a KF residual. Thus it only 
uses the previous support estimate, not the previous signal estimates, 
to improve the current reconstruction. KF-CS uses both and hence 
it outperforms LS-CS when the available number of measurements 
is small, e.g. see Fig. [2] The work of 1151 gives an approximate 
batch-CS approach for dynamic MRI. Bayesian approaches, but all 
for reconstructing a single sparse signal, include 1161 . 1171 . 1141 . 
Related work, which appeared after fTj, and in parallel with (2l . 
includes 1181 (addresses recursive sparse estimation but with time- 
invariant support), and our own later work on modified-CS 1 191 . 

This paper is organized as follows. The signal model and the 
algorithm are described in Sec. [TT] We analyze the CS-residual step 
of KF-CS in Sec. [Til] In Sec. |W] we prove KF-CS error stability 
and discuss why our result needs stronger assumptions than a similar 
result for LS-CS Q. Simulation results comparing KF-CS with LS- 
CS and simple CS are given in Sec. [V] and conclusions in Sec. |VI| 

In this work, we do "CS", whether in simple CS or in CS- 
residual, using the Dantzig selector (DS) fS^. This choice was initially 
motivated by the fact that its guarantees are stronger (depend only on 
signal support size, not support elements) than those for BPDN 1111 
and its results are simpler to apply and modify. In later work (5), 
we have also used BPDN. Between DS and Lasso (9), either can be 
used and everything will remain the same except for some constants. 

A. Notation and Problem Definition 

The set operations U, D, and \ have the usual meanings. T c denotes 
the complement of T w.r.t. [1, m] := [1, 2, . . . m], i.e. T c := [1, m] \ 
T. \T\ denotes the size (cardinality) of T. 

For a vector, v, and a set, T, vt denotes the \T\ length sub-vector 
containing the elements of v corresponding to the indices in the set T. 

denotes the Ik norm of a vector v. If just ||u|| is used, it refers 
to 1 1 1 1 2 - For a matrix M, ||M|]fc denotes its induced fc-norm, while 
just ||M|| refers to || A/ 1| 2 - M' denotes the transpose of M. For a tall 
matrix, M, M 1 " := (M' M)^ 1 M' . For symmetric matrices, Mi < 
M2 means that M2 — Mi is positive semidefinite. For a fat matrix A, 
At denotes the sub-matrix obtained by extracting the columns of A 
corresponding to the indices in T. The .^-restricted isometry property 
(RIP) constant, 5s, and the 5", S'-restricted orthogonality constant, 
0s,s'> are as defined in equations 1.3 and 1.5 of QO respectively. 

For a square matrix, Q, we use (Q)t!,t 2 to denote the sub-matrix 
of Q containing rows and columns corresponding to the entries in Ti 
and T2 respectively. I denotes an appropriate sized identity matrix. 
The m x m matrix It is defined as 



(It)t,T = I, (fr)TS[l,i7.] = 0, (/T)[l,m],T<= = 



(1) 



We use to denote a vector or matrix of all zeros of appropriate 
size. The notation z ~ /V(/i, S) means that z is Gaussian distributed 
with mean \i and covariance S. 



Let (zt)mxi denote the spatial signal at time t and (yt)nxi, with 
n < m, denote its noise-corrupted observation vector at t, i.e. y t = 
Hzt + wt. The signal, Zt, is sparse in a given sparsity basis (e.g. 
wavelet) with orthonormal basis matrix, $ mxm , i.e. x t = z± is a 
sparse vector. We denote its support by Nt and we use St := |A?t| 
to denote its size. Thus the observation model is 



The above model can be summarized as follows. 

' S a if t = tj 



yt = Ax t +w t , A = H$, E[w t ] = 0, E[w t w' t ] = a 2 1 



(2) 



where E[-] denotes expectation. We assume that A has unit norm 
columns. The observation noise, w t , is independent identically dis- 
tributed (i.i.d.) over t and is independent of xt. Our goal is to 
recursively estimate x t (or equivalently the signal, z t = $a;t) using 
2/i,. . .yt- By recursively, we mean, use only yt and the estimate 
from t — 1, xt-i, to compute the estimate at t. 
Definition 1 (Define S„ S„): For A := 

1) let S* denote the largest S for which 5s < 1/2, 

2) let 5** denote the largest S for which 62s + ds,2S < 1. 
Definition 2 (Define x t , Nt): We use ait to denote the final esti- 
mate of xt at time t and N t to denote its support estimate. 

Definition 3 (Define T, A, A e ): We use T = T t := N t -i to 
denote the support estimate from the previous time. This serves as 
an initial estimate of the current support.We use A = At := Nt \ T t 
to denote the unknown part of the support at the current time. We 
use A e = A e ,t := Tt \ N t to denote the "erroneous" part of T t . To 
keep notation simple, we remove the subscript t in most places. 

II. Kalman Filtered CS residual (KF-CS) 

The LS-CS-residual (LS-CS) algorithm (4) om y use d the previous 
support estimate, T, to obtain the current reconstruction, but did not 
use the previous nonzero coefficient estimates, {xt-i)T. Because of 
temporal dependencies, these also change slowly and using this fact 
should improve reconstruction accuracy further. To do this we can 
replace LS by regularized LS. If training data is available to learn a 
linear prior model for signal coefficients' change, this can be done by 
replacing the initial LS estimate of LS-CS by a Kalman filtered (KF) 
estimate. The KF will give the optimal (in terms of minimizing the 
Bayesian MSE) regularization parameters if the size of the unknown 
support, |A| =0. These will be close-to-optimal if |A| is nonzero 
but small. We assume a simple linear model described below in Sec. 
III-AI We develop the KF-CS algorithm for it in Sec. lII-Bl and discuss 
its pros and cons in Sec. III-CI 

A. Signal Model 

We assume an i.i.d. Gaussian random walk model with support 
additions and removals occurring every d time instants. Additions 
occur at every tj — 1 + jd and removals at every tj+i — 1 for all 
j > 0. The support sets, Nt, at all t, are deterministic unknowns, 
while the sequence of xt's is a random process. 

Signal Model 1: Assume the following model. 

1) At t = 0, xo is 5*o sparse with support iVo and (a:o)iv ~ 

2) At every addition time, tj = 1 + jd, for all j > 0, there are 
S a new additions to the support. Denote the set of indices of 
the coefficients added at tj by A(j). 

3) At every removal time, tj+i — 1 = (J + l)d, for all j > 0, 
there are S r removals from the support. 

4) The maximum support size is S max , i.e. \N t \ < S max at all t. 

5) Every new coefficient that gets added to the support starts from 
zero and follows an independent Gaussian random walk model 
with zero drift and change variance o 2 ys . 

6) The value of every removed coefficient and the corresponding 
change variance both get set to zero. 



|JV«\JVt_i| = 
|JVt_i\JV«| = 



otherwise 

S r if t — tj-\-1 — 1 







otherwise 



x ~ Af(0, Qo), where Q = o- 3yafi I No 
Vt ~ Af(0, Qt), where Q t = o 2 sys I Nt 

(xt)N t = (xt-l)N t + {.Vt)N t 

(x t ) N c = (v t ) N c = (3) 
Assumption 1: We assume that 

1) The support changes slowly over time, i.e. S a •C \Nt\ and 
S r <K \Nt\. This is empirically verified in (4), (5). 

2) The nonzero values also change slowly, i.e. o 2 ys is small. 

B. KF CS-residual (KF-CS) algorithm 

Recall that T := Nt-i denotes the support estimate from t — 1. 
KF CS-residual (KF-CS) runs a KF for the system in J2), (£3j but with 
Qt replaced by Qt = a 2 ys lT and computes the KF residual, denoted 
2/t,res- The new additions, if any, to T, are detected by performing CS 
on 2/t,res and thresholding the output. If the support set changes, an LS 
estimate is computed using the new support estimate. If it does not 
change, we just use the initial KF output as the estimate. We then use 
this estimate to compute deletions from the support by thresholding 
with a different (typically larger) threshold. Once again, if the support 
set changes, a final LS estimate is computed using the new support 
and if not, then we just use the initial KF output. 

In this work, the CS-residual step in KF-CS uses the Dantzig 
selector (8) (but this can be easily changed to BPDN or Lasso or 
any greedy method such as OMP etc), i.e. it solves 



mm HClli s.t. \\A'(y - AC) Hoc < A 



(4) 



with y replaced by the current KF residual, 2/t,res. 

Let Pt\t-i, Pt and K t denote the "assumed" prediction and update 
error covariance matrices and the Kalman gain used by the KF in KF- 
CS. We say "assumed" since the KF does not always use the correct 
value of Qt and so Pt\t-i or Pt are also not equal to the actual error 
covariances. 

We summarize the complete KF-CS algorithm below. 
Initialization (t = 0); At t — 0, we run simple CS (Dantzig selector) 
with a large enough number of measurements, no > n, i.e. we solve 
I0 with y = yo and A = Ao (Ao will be an no x m matrix). 
This is followed by support estimation and then LS estimation as in 
the Gauss-Dantzig selector. We denote the final output by xq and its 
estimated support by No. For t > do, 

1) Initial KF. Let T — Nt-i- Run Kalman prediction and update 
using Q t — o- 2 ys lT and compute the KF residual, 2/t.res, using 

ft|t-i = Pt-l + Qt, where Q t ■= o- 2 ys I T 
Kt = P t \t-iA' {APt^A' + a 2 /)" 1 
Pt = (I - K t A)P t]t -i 
xt.uM = (I - K t A)x t -i + K t yt 
yt,res = yt — Ax±,m (5) 

2) CS-residual. Do CS (Dantzig selector) on the KF residual, i.e. 
solve ([4} with y = 2/i,res- Denote its output by fit. Compute 



it,CSres = Xt,im< + Pt 

3) Detection and LS. Detect additions to T using 

fket = T U {i G T c : |(£ t ,cs res )i| > a} 



(6) 



2 



If Tdet is equal to T, set x t ,iet = Xt,m, 
else, 

compute an LS estimate using Tdet, i.e. compute 

(xt,det)f , = Af 'y t , (xt.fetjfc = ( 7 ) 

del del j c i 

4) Deletion and Final LS. Estimate deletions to Tdet using 

Nt = Tde, \{i£ Tdet : |(&,det)i| < a de i} (8) 

If Nt is equal to T, set it = &t,init, 
else, 

compute an LS estimate using Nt and update P t , i.e. 

(it) Arc = 
(P^Nt.Nt = (Afi t ^JV t ) 1<j2 ' 

5) Output i t and z t = $it. Feedback xt, Pt, Nt. 

Remark 1: Notice that the final LS step re-initializes the KF when- 
ever the estimated support changes. This ensures less dependence of 
the current error on the past, and makes the stability analysis easier. 

Remark 2: For ease of notation, in {5j> we write the KF equations 
for the entire xt. But the algorithm actually runs a reduced order 
KF for only (xt)r at time t, i.e. we actually have (£t)T c = 0, 

{Kt)T<=,[l:n} = 0, (Tt|t-l)[l,m],T c = 0, {Pt-l)[l ,m],T<= = 0, 

(Pt|t-i)T<=,[i,m] = 0, and (Pt_i) T c,[ l m ] = 0. For computational 
speedup, the reduced order KF should be explicitly implemented. 

Remark 3: The KF in KF-CS does not always run with correct 
model parameters. Thus, even when al ys /a 2 is small, it is not clear 
if KF-CS will always outperform LS-CS |4j. This will hold at times 
when the support is accurately estimated and the KF has stabilized 
[see Fig. |2(a)| . Also, this will hold when support changes occur 
slowly enough, and n is small so that LS-CS error becomes instable, 
but is just large enough to prevent KF-CS instability [see Fig. |2(c)| . 

C. Discussion of the Signal Model 

A more accurate model than Signal Model [T] would be random 
walk with nonzero and time-varying drift. If accurate knowledge of 
the time-varying drift is available, the KF estimation error can be 
reduced significantly. But, in practice, to estimate the time-varying 
drift values, one would need a large number of identically distributed 
training signal sequences, which is an impractical assumption in most 
cases. On the other hand, in the above model the parameters are time- 
invariant and their values can be estimated from a single training 
sequence. This is done in (5), 1201 . 

Now, a random walk model at all times is not a realistic signal 
model since it implies that the signal power keeps increasing over 
time. The following is what is more realistic. A new sparse basis 
coefficient starts from zero and slowly increases to a certain roughly 
constant value, i.e. it follows a random walk model for sometime 
and then reaches steady state. Steady state can usually be accurately 
modeled by a (statistically) stationary model with nonzero mean. To 
design KF-CS for such a model one would either need to detect 
when a coefficient becomes stationary or one would need to know it 
ahead of time. The former will typically be very error prone while 
the latter is an impractical assumption. To avoid having to do this, 
we just assume a random walk model at all times. 

In Sec. [V] we show that the KF-CS algorithm of Sec. III-BI works 
both for data generated from Signal Model Q] and for data generated 
from a more realistic bounded signal power model taken from (4), 
which is a deterministic version of what is discussed above. In (5), 
1201 , we show that it works even for actual image sequences. 



III. Analyzing (KF)CS-residualstep 
The KF residual, j/t,res, can be rewritten as yt,™ = Afit+wt where 

(A)A = (Xt — £t,init)A = (xt)A 
{fit)T = (Xt — &t,Ut)T 

= [I - K t A T ]{xt - x t -i)r - KtA A (xt)A - K t w t 

(A)(TUA)» = (10) 

where T = N t -i and K t = (K t ) T ,[i,n]- Thus, fit is |T U A| = 
\N t U A e | sparse. 

In Appendix lAl we show that ||(/3t):r|| is bounded as in dl6l ). As 
we argue there, if (a) the support changes slowly enough, (b) the 
signal values change slowly enough, (c) the noise is small enough 
and (d) the previous reconstruction is accurate enough, this bound 
will be small, i.e. fit will be compressible along T. In other words, 
fit will be only | A | -approximately-sparse. Because of (a) and (d), | A| 
will be small compared to \Nt\. Thus doing CS on yt,re$ will incur 
much less error than doing CS on yt (simple CS), which needs to 
reconstruct a | TV* | -sparse signal, x t . This statement can be quantified 
by using l |16| > to bound CS-residual error exactly like in Theorem 
1] and then doing the comparison with CS also as in |4). 

The CS-residual error bound will be directly proportional to the 
bound on ||(/3t)T| given in <16t . This can be used to argue why 
KF-CS outperforms LS-CS when n is smaller and support changes 
slowly enough. We do this in Appendix [B] 

IV. KF-CS Error Stability 

Analyzing the KF-CS algorithm of Sec. III-BI which includes the 
deletion step, is difficult using the approach that we outline below. 
Thus, in this section, we study KF-CS without the deletion step, i.e. 
we set etdei — 0. KF-CS without deletion assumes that there are few 
and bounded number of removals and false detects. For simplicity, 
in this work, we just assume S r — in Signal Model Q] and we will 
select a so that there are zero false detects. S r = along with the 
assumption that the maximum sparsity size is S'max implies that there 
are only a finite number of addition times, K, i.e. for all t > tx-i, 
N t = Nt K _ 1 . We summarize this in the following signal model. 

Signal Model 2: Assume Signal Model [T] with S r = 0. This 
implies that there are only a finite number of addition times, tj, 
j = 0,l,... (K - 1) and K = \ S, ™*~ S ° ] . Let t K ■= oo. 

Consider the genie-aided KF, i.e. a KF which knows the true 
support Nt at each t. It is the MMSE estimator of x t from yi,. ■ .yt 
if the support sets, Nt, are assumed known and the noise is Gaussian, 
and is the linear MMSE for any arbitrary noise. In this section, we 
find sufficient conditions under which, with high probability (w.h.p.), 
KF-CS for Signal Model [2] and observation model given by l(2j gets 
to within a small error of the genie-KF for the same system, within 
a finite delay of the new addition time. Since the genie-KF error is 
itself stable w.h.p., as long as <5s max < 1, this also means that the 
KF-CS reconstruction error is stable w.h.p. 

Our approach involves two steps. Consider t G \tj, First, 
we find the conditions under which w.h.p. all elements of the current 
support, Nt = Ntj get detected before the next addition time, ij+i. 
Denote the detection delay by Tdet. If this happens, then during [tj + 
Tdet, tj+i), both KF-CS and genie-KF run the same fixed dimensional 
and fixed parameter KF, but with different initial conditions. Next, 
we show that if this interval is large enough, then, w.h.p, KF-CS will 
stabilize to within a small error of the genie-KF within a finite delay 
after tj +Tdet. Combining these two results gives our stability result. 

We are able to do the second step because, whenever Nt 7^ Nt-i, 
the final LS step re-initializes the KF with P t , x t given by l[9}. This 
ensures that the KF-CS estimate, x t , and the Kalman gain, Kt, at 
t + 1 and future times depend on the past observations only through 
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T := N t . Thus, conditioned on the event {N t = N t , V f 6 [tj + 
Tda,tj+i)}, there will be no dependence of either x t or of K t on 
observations before tj + Tdet. 

A. The Stability Result 

We begin by stating Lemma [JJ which shows two things. First, if 
accurate initialization is assumed, the noise is bounded, S max < 
otdei = and a is high enough, there are no false detections. If the 
delay between addition times also satisfies d > Tdet(e, S a ), where 
Tdet is what we call the "high probability detection delay", then the 
following holds. If before tj, the support was perfectly estimated, 
then w.p. > 1 — e, all the additions which occurred at tj will get 
detected by tj + Td e t(e, S a ) < tj+l- 

Lemma 1: Assume that x t follows Signal Model [2] If 

1) (initialization (t = 0)) all elements of xo get correctly detected 
and there are no false detects, i.e. No = No, 

2) (measurements) Smax < S** and ||w||oo < A/||j4||i, 

3) (algorithm) we set add = and a 2 = B, := 
Ci(S'max)S'maxA 2 , where Ci(5) is defined in (§] Theorem 1.1], 

4) (signal model) delay between addition times, d > Tdet(e, S a ), 



where Tdet(e, S) 



4B» 



[*•] denotes the greatest integer function and 

J z ° c (l/v / 27r)e~ :E ^dic is the Gaussian Q-function, 



1, (ID 



then 



1) at each t, N t C JV t C JV t +i and so |A e ,t+i| 

2) at each t, \\x t — it,csres|| 2 < B, 







3) Pr(£ j |F J ) > 1 - e where Fj := {N t = JV t for t = tj - 1} 
and := {iV* = JV t , V * 6 [tj + r dc ,(e, S),tj+i - 1]}. 

The proof is given in Appendix [Cl The initialization assumption is 
made only for simplicity. It can be easily satisfied by using no > n to 
be large enough. Next we give Lemma[2]which states that if the true 
support set does not change after a certain time, t nc , and if it gets 
correctly detected by a certain time, t* > t nc , then KF-CS converges 
to the genie-KF in mean-square and hence also in probability. 

Lemma 2: Assume that x t follows Signal Model [2] <5s max < 1; 
and a.dei = 0. Define the event D := {N t = N t = N*, V t > t»}. 
Conditioned on D, the difference between the KF-CS estimate, Xt 
and the genie-aided KF estimate, xt,GAKF, difft := x t — Xt,GAKF, 
converges to zero in mean square and hence also in probability. ■ 

The proof is similar to what we think should be a standard result 
for a KF with wrong initial conditions (here, KF-CS with t = t* as 
the initial time) to converge to a KF with correct initial conditions 
(here, genie-KF) in mean square. A similar (actually stronger) result 
is proved for the continuous time KF in 1221 . We could not find 
an appropriate citation for the discrete time KF and hence we just 
give our proof in Appendix [E] After review, this can be significantly 
shortened. The proof involves two parts. First, we use the results from 
fl3l and (2Tj to show that (a) P t \t-i, Pt, K t and J t := I- K t A N , 
converge to steady state values which are the same as those for the 
corresponding genie-KF; and (b) the steady state value of J t , denoted 
J*, has spectral radius less than 1 and because of this, there exists a 
matrix norm, denoted ||.|| p , s.t. ||J*|| P < 1. Second, we use (a) and 
(b) to show that the difference in the KF-CS and genie-KF estimates, 
diff t , converges to zero in mean square, and hence also in probability 
(by Markov's inequality). 

A direct corollary of the above lemma is the following. 

Corollary 1: Assume that x t follows Signal Model[2] 5s milx < 1; 
and a de i = 0. Define the event D f := {N t = N t = N„, V t £ 
[t», t**]}. For a given e, e C rr, there exists a tkf(£, fierr, N„) s.t. for 



all t € [t* + tkf, t„], Pr(||diff t || 2 < e m \ Df) > 1 - e. Clearly 
if t»* < t* + tkf, this is an empty interval. 

The stability result then follows by applying Lemma [2] followed 
by Corollary Q] for each addition time, tj. 

Theorem 1 (KF-CS Stability): Assume that xt follows Signal 
Model [2] Let difft := x t — x t ,GAKF where Xt,aAKF is the genie- 
aided KF estimate and x t is the KF-CS estimate. For a given e, e elT , 
if the conditions of Lemma [T] hold, and if the delay between addition 
times, d > Td e t(e, S a ) + tkf(£, £etr, N tj ), where T de t(., .) is defined 
in i ll It in Lemma [T] and tkf(-, ■, •) in Corollary [T] then 

1) Pr(||diff t || 2 < eerr) > (l-e) J+2 , for all t £ [tj +r d et(e, S a ) + 
T KF (e, em, N tj ), t j+ i - 1], for all j = 0, ... (K - 1), 

2) Pr(\A\ < S a and |A e | =0, V t) > (1 - e) K 

3) Pr(\A\ = and |A e | = 0, V t € [tj + r d et(e, S a ),t j+1 - 
1], V j = 0,...A'-l) > (l-e) K . 

The proof is given in Appendix [D] A direct corollary is that after 
tK-i KF-CS will converge to the genie-KF in probability. This is 
because for t > tx-i, Nt remains constant (tx = oo). 

B. Discussion 

Consider a t G [tj,tj+i). Notice that tkf depends on the current 
support, Nt 
additions at t 



Nt while Tdet depends only on the number of 
j, S a - Theorem [Tj says that if n is large enough so 



that S n 



< 5, 



OLdel 



(ensures no deletions); a 



IB, 



(ensures no false detects); and if the time needed for the current 
KF to stabilize, tkf(z, £eir, Nt s ), plus the high probability detection 
delay, Td e t(e, S a ), is smaller than d, then w.p. > (1 — e) J+2 , KF-CS 
will stabilize to within a small error, e e n-, of the genie-KF before the 
next addition time, tj+i. If the current tkf is too large, this cannot 
be claimed. But as long as r det (e, S a ) < d, the unknown support size, 
|A| remains bounded by S a , w.p. > (1 — e) . 

We give our result for the case of zero removals and zero false 
detects, but the same idea will extend even if |A e | is just bounded. 

As explained in Sec. III-CI most signals do not follow a random 
walk model forever (such a model would imply unbounded signal 
power). In practice, a new coefficient may start with following a 
random walk model, but eventually reach steady state (stationary 
model). In this case, it should be possible to modify our result to 
claim that if, before reaching steady state, all coefficients become 
large enough to exceed the threshold plus upper bound on error, and 
if this happens before the next addition time, KF-CS remains stable. 

Our result is weaker than that of LS-CS |4) - it needs S max < S„ 
(the LS-CS result only needs S a < S„ and S max < 5*); it uses 
a random walk model; it does not handle support removals; and the 
computed high-probability detection delay is quite loosfl This is due 
to two main reasons. One is that we assume a zero drift random walk 
model as the signal model both for defining KF-CS and for analyzing 
it, while LS-CS uses a model with nonzero drift for the analysis (the 
algorithm does not assume any signal model). The reason for our 
choosing this model is explained in Sec. III-CI The second and more 
important reason is that bounding KF error is more difficult than 
bounding LS error. This is because the KF error, and hence also the 
(KF)CS-residual error, depends on the previous reconstruction error. 
The (LS)CS-residual error only depends on \T\, |A| and if we can 
get a time-invariant bound on these, we can do the same for the error. 

V. Simulation Results 
We discuss two sets of simulation results. The first simulates data 
according to Signal Model|2]and verifies KF-CS stability. The second 
set of simulations compares KF-CS with LS-CS |4| and simple CS 
(Dantzig selector) [8 |. This comparison uses the more realistic signal 

'Our result may even go through if CS-residual was replaced by CS. 
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(a) n = 59, NMSE 



(b) n = 59, misses & extras 



(c) n = 45, NMSE 



Fig. 2. Comparing KF-CS with CS and LS-CS. CS-residual in LS-CS or in KF-CS used A = 0.17. Misses 



(d) n = 45, misses & extras 

Extras = E[\N t \ N t \ 





(a) Signal Model E] with So 

Sa — 2, d = 5, Omax = 26 



(b) Signal Model E] with S 

S n = 4, d = 10, S max = 20 



Fig. 1. Verifying KF-CS stability for Signal Model [2] 

model assumed in [4], which has a roughly constant signal power and 
support size and allows regular additions and removals from support. 



A. Signal Model\2} verify KF-CS Stability 
We simulated Signal Model [2] with m 



256, So = 8, S a 



t = 1, 6, 11, ... , 46. The measurement model used n = no = 72 
and Gaussian noise with a = 0.16. The normalized MSE (NMSE) 
is plotted in Fig. |l(a)| In a second simulation, we increased S a , but 
we also increased d: we used S a = 4, d = 10 and S max = 20 and 
everything else was the same. We show the error plot in Fig. |l(b)| 
Notice that in both cases, (i) KF-CS stabilizes to within a small error 
of the genie-KF within a short delay of a new addition time; and (ii) 
after the final set of new additions, KF-CS converges to the genie- 
KF. The difference between the two is that the peak errors at the new 
addition time are larger in the second case (since S a is larger). 

We implemented the KF-CS algorithm of Sec. III-BI but without 
the deletion step, i.e. we set ot. af ,i — 0. Since the observation noise 
is not truncated, occasionally the addition step can result in a very 
large number of false additions. To prevent this, we restricted the 
maximum number of allowed additions at a given time to 771/ log 2 m 
(7 between 0.7 and 1.25) largest magnitude coefficients. 

B. Bounded signal power model from $4\l 

For this comparison we used the signal model of (4). This is a 
realistic signal model with roughly constant signal power and support 
size. We used m = 200, So = 20, S a = 2 = S r , a* = 0.2, M = 1, 
d — 8 and r = 3. Thus new additions occurred at t = 2, 10, 18. 
Coefficient decrease began at t = 7, 15 and these got removed at 
t = 9, 17 respectively. The measurement noise was uniform(—c, c). 

In the first simulation, we used 710 = 150, n — 59 and c = 0.1266. 
LS-CS used A = 0.176, a = c/2 = 0.06 = a dei . Also, it restricted 
maximum number of additions at a time to S a + 1. The KF-CS 
algorithm of Sec. lII-Bl was implemented. It used the above parameters 
and it set a 2 = c 2 and a 2 ys = 0.01. For the signal model of (4), there 
are no correct choices of KF parameters. The average of (x t — x t -i) 2 
over i and t was (0.04* (5/8) * (2/20) + 0.11 * (3/8) * (2/20) + 0* 
l*(16/20)) w 0.01 and this motivated the choice of a sya . The noise 
variance is c 2 /3, but we use a larger value to also model the effect of 
extra observation error due to the unknown support A. The NMSE 
plot is shown in Fig. |2(a)| The mean number of misses (E[| JVt \ Nt\]) 



E[\N t \Nt[ 

and of extras (E[|JV t \ N t \]) are plotted in Fig. [2(b)] We averaged 
over 100 Monte Carlo runs. Notice that right after a new addition, 
both LS-CS and KF-CS have similar MSE, but in the stable state 
KF-CS stabilizes to a smaller value. The NMSEs for CS (Dantzig 
selector) and Gauss-Dantzig selector even with different choices of 
A are much larger (40-60%). 

In a second simulation, we used no = 150, n — 45 and c = 0.15 
and everything else was the same as above. The error plots are shown 
in Fig. |2(c)| and the number of extras and misses are plotted in Fig. 
|2(d)| With such a small n, LS-CS error becomes instable. But n — 45 
(along with large enough delay between addition times, d = 8 and 
small enough r = cr 2 ys /a 2 — 0.44) is large enough to prevent KF- 
CS instability. 

VI. Conclusions and Future Work 
We proposed KF CS-residual (KF-CS) which replaces CS on the 
raw observation by CS on the KF residual, computed using the known 
part of the support. We proved KF-CS stability, but the assumptions 
used were somewhat strong (stronger than those used for LS-CS [4]). 
We demonstrated via simulations that KF-CS error is stable and small 
under much weaker assumptions. Also, it significantly outperformed 
LS-CS when the available number of measurements was very small. 

A key direction of future work is to prove KF-CS stability under 
weaker assumptions. This will require assuming a signal model with 
nonzero drift (to get a tighter detection delay bound) and bounded 
signal power. It may also help to assume a statistical prior on support 
change, e.g. by using a model similar to 11171 . A useful extension of 
KF-CS would be to replace CS-residual by modified-CS 1191 . 

Appendix 

A. Bounding || )t I] 

Recall that T t = JV t -i and A t = N t \ N t -i- Let S t = 5|T t | and 
0t = 0|T t |,|A t |- Also, let K t = (Kt) T ,ii,n}, 

M t = A T ' A T + (Pt|t-i)^W 2 and 



s /cr 



(12) 



We use the following simple facts in the discussion below (21 1. For 
symmetric positive definite matrices, M, M, \\M\\ = X m ax(M) = 
l/A min (Af- 1 ), A min (M + M) > A min (M) + A min (A/) while the 
inequality holds in the opposite direction for A ma x- Here A ma x(Af), 
A m in(M) denote the maximum, minimum eigenvalue of M. 
As is well known 1131 , Kt, [I — KtAr], Pt can be rewritten as 

K t = M^At' 
Jt~I- K t A T = Mf ^Ptit-OJ.Vff 8 

(Pt\t-i)r,T = (Pt-i)T,r + (o- 2 vs It)t,t, where 
M~_\a 2 if T t = T t -! 
(At/At, )-V if T t # T t _i 

The third equation is repeated from ([5). To bound ||(/3t)T||, defined 
in d 1 Ot . we need to bound ||Jt|| and H-RTtAr'Ailll, which in turn 
requires bounding ||M t _1 ||, ||(-Pt|t-i)rTll ar, d ||^4t'j4a||- Using the 



{Pt-l)T,T 



(13) 
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definition of 6s, s' (H e 1 1-5], it is easy to see that ||At'j4a|| < Ot- 
Using (ED, < (A mi „(AC\)^ + ^V)" 1 if T t = 

T t -i and IKPtit-O^Vll < ((1 + S^a 2 + olys)- 1 otherwise. 
Also, ||M t - || = A max (A/ t _1 ). Thus bounding them requires upper 

N Using the 



bounding A max (M t 1 ) and lower bounding A n 
definition of the RIP constant (8) eq. 1.3], 



\m: 



= A max (M t 1 ) = 



^{At'At + {Pt\t-l)T,T° 2 ) 

1 



l-5 t + 



i-s t +- 



i)t,t) 

ifT t = r t _! 

if Tt ± Tt-i 



(14) 



Similarly, we can lower bound A m i n (M t _ 1 ) and use it to get 
" ^ ifT t = T t _i 



\(Pt 



t\t-l)T,T 



k << 



i+«t-i+ 



II A/, 



-1 + 



Ci+«t-i 
1 

1 



Tt-2 

if T t = T t -i + Tt-2 

if T t / r t _i 

(15) 



From GO}, {CU 

+ t ||(x t ) A || + 



II (ft )t II 



< II M t ~ 



t|t- 



-i)t,tI 



(X t - Xt-l)T 



Tl 



Using this and the above bounds, we get 

(A)t|| < a t [Tl + 6> t || Ot) A || + ||A T 't«t||] , where 

\\(x t -l -Xt-l)TnNt\\ + ||(£t-l)Aj| + y/\T n N t \\\v t \ 



and at is defined in d 14b and bt in dl5t . Notice that at is an increasing 
function of St and r, and also of H-M^H < a t -i if Tt = Tt-i, 

Now, A C (JVt-i \ T) U (JV t \ Nt-i) and A e C (T \ iV t -i) U 
(Nt-i \ Nt). If the previous reconstruction is accurate enough, 
the previous support estimate will also be accurate enough. This 
combined with the slow support change assumption will imply that 
|A| and |A e | are small enough. |A e | small enough will imply that 
\T\ is small enough (since |T| < \N t \ + A e ]) and hence St is small 
enough. St small ensures smaller a t and larger bt. | A e | and | A| small 
enough will also imply that 8t is small enough. The noise being small 
along with |A e | small will imply that || .A-r'tOtH is small. 

Slow signal value change implies (i) r is small enough and (ii) at 
alH, 1 1 vt 1 1 oo is small enough w.h.p.. Small r implies that at is small, 
but it also implies that bt is small. Small |kt||oo at all t, along with 
small noise, also results in the previous reconstruction being accurate 
enough which, in turn means || (xt-i — aSt-l)Tniv t || is small. Using 
this and the fact that only small coefficients get falsely deleted or 
removed^, || (xt-i)A e | is also small. All this ensures that Tl in l !16t 
is not very large even when b t is small. This combined with the 
discussion of the previous paras ensures that the bound on || (/3t )r | 
is small. Thus, if (a), (b), (c), (d) given in Sec. [m] hold, ||(/3 t ) T || is 
small, i.e. fit is only A -approximately-sparse; and |A| is small. 

B. Comparing KF-CS and LS-CS using the bound on \\{fit)T\\ 
We will mention that we are only comparing upper bounds here. 
Consider at defined in dl4t . Suppose r = 0.5 and n is such that 
S t = 0.8 for all t. LS-CS can be interpreted as KF-CS with r = 
oo. Thus for LS-CS a t = 1/(1 - 8 t ) = 5 always. For KF-CS, 

2 The fact that only small coefficients get removed from Nt is not modeled 
in Signal Model[T] but is true in practice. But it is modeled in our simulations. 



even if, at t, T t / T t -t, a t = 1/(0.2 + (1/5.5)) = 2.62 (almost 
half). If T t does not change for one time instant, at+i reduces to 
1/(0.2 + l/(a t + 0.5)) = 1.92. If it does not change for two time 
instants, then at+2 reduces to 1.63. If T t does change and the change 
is a correct addition, the set A t becomes smaller and so the second 
term of l !16t . #t||(a;t)A||, reduces. In either case, the bound reduces. 

Of course for LS-CS, bt — oo and so the first term of l |16t , Tl = 
while for KF-CS, Tl / 0. But if o% ya and a 2 are small and 
support changes slowly, Tl will also be small (argued earlier). When 
n is small, the net effect is that the KF-CS bound on ||(/3t)T||, and 
hence the bound on CS-residual error, is small compared to that for 
LS-CS. This is the main reason that, when n is very small, KF-CS 
error remains stable, while nothing can be said about LS-CS error. 
In simulations, we notice that it often becomes unstable. 

C. Proof of Lemma [7] 

With || 1U || oo < A/||A||i, all results of ||8| hold w.p. 1 (because 
eq 3.1 of (8j holds w.p. 1). From Theorem 1.1. of (8), if a signal is 
5-sparse, and if S < then, the error after running the Dantzig 
selector is bounded by B*. 

The first two claims follow by induction. Consider the base case, 
t = 0. The first claim holds because condition [T] of the lemma holds 
and because S r — in Signal Model [2] Since |JVn| < S m ax and 
condition [2] holds, (8] Theorem 1.1] applies. Thus the second claim 
holds at t = 0. For the induction step, assume that the first two claims 
hold for t — 1. Using the first claim for t — 1, |A e ,t| = 0. Thus, fit 
is \Nt U A e ,f| = |7Vt| sparse. Since |JVt| < S'max and condition [2] 
holds, we can apply (§] Theorem 1.1] to get \\fi t — fit\\ 2 < B t . But 
Xt — it,csrcs = fit — fit and so the second claim follows for t. By 
setting a = V ' B* (condition[3j, we ensure that for any i with x% = 0, 
(£csres)f = {xi - (ics res )i) 2 < \\ x ~ £cs res || 2 < B t = q 2 (no false 
(16) detects). Using this and S r = 0, the first claim follows for t. 

For the third claim, it is easy to see that for any i E A, if, at 
t, (x t )j > 2a 2 + 2B t = 4B*, then i will definitely get detected 
at t. Consider a i 6 [tj 



vj,tj+i — 1], Since Fj holds, so at t = tj, 
- 0, there cannot be false deletions and 
|A| < S a - Consider the worst case: no 



S a . 



A = A(j). Also, since otdei = 
thus for any t G [tj , tj+i — 1], 

coefficient has got detected until t, i.e. At = A(j) and so | At 
All i e A(j) will definitely get detected at t if (x t ) 2 > 4B* for all 
i G A(j). From our model, the different coefficients are independent, 

and for any i G A(j), (x t ) 2 ~ A/"(0, (t-tj + 1)<Tsj, s ). Thus, 



Pr((x t )i > 4B„ Vi G A(j) | Fj)= 2Q 



4B* 



(t - tj + l)a'i y 



Using the first claim, Pr(N t = Nt | Fj) is equal to this. Thus for 
t = tj + r<fet(e, S a ), Pr(N t = N t \ Fj) > 1 - e. Since there are no 
false detects; no deletions and no new additions until tj+i, Nt = N t 
for t = tj + Tdd implies that Ej occurs. This proves the third claim. 

D. Proof of Theorem fj] 

The events Ej and Fj are defined in LemmaQ] At the first addition 
time, to = 1, using the initialization condition, iVt -i = Nt -i, i.e. 
Fo holds. Thus, by LemmaQ] Pr(Eo) > 1— e. Consider tj for j > 0. 
ClcMyE Pr(Ej\E ,E u . = Pr{Ej\Ej-{) = Pr(Ej\Fj). 

By LemmaQ] Pr(Ej\Fj) > 1 - e. Combining this with Pr(E ) > 
1 - e, we get Pr(Ej) > (1 - e) 3+1 for all j > 0. 

Assume that Ej occurs and apply Corollary Q] with t* = tj + 
Tdet(e, S a ) and t„» = tj+i — 1. Combining the conclusion of Corollary 
Q]with Pr{Ej) > (1 - e) j+1 , the first claim follows. 

The second and third claims follow directly from arguments in the 
proof of LemmaQ] and Pr(E flBifl... E K -i) > (1 - e) K . 

3 since Ej = {(x tj + Tlla )? > 4B«, Vi G A tj+T<kt } and the sequence of 
xt's is a Markov process 
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E. Proof of Lemma\2\ and Corollary |7] 

Let Xt,GAKF denote the genie-aided KF (GA-KF) estimate at t. 

Assume that the event D occurs. Then, for t > t,, N t — N t — TV*, 
i.e. At := Nt\Nt-i = TV* \TV* — (f) (empty set) and so xt = xt,Mt- 
Let e t = x t — x t and e t = x t — x t>G AKF- 

For simplicity of notation we assume in this proof that all variables 
and parameters are only along TV,, i.e. we let xt = (&t)N,, et = 
(e t )iv», v t = (v t )N,, P t \t-i = (Pt\t-i)N,,N,, K t = (Kt) N ,,[v.nj- 
Let J t _ = / - -fLjAiv,- Similarly for x t ,GAKF, et, Pt\t-i, K t , Jt- 
Here P t \t-i, Kt, Jt are the corresponding matrices for GA-KF. 

Let E[-] denote expectation w.r.t. all random quantities conditioned 
on the event D, and let E[-|yi, . . . yt] denote conditional expectation 
given yi,. ..yt and the event D. 

From ((5). for t > t,, e t , it and difft — e t — e t satisfy 

e t — Jte t -i + Jtvt — K t Wt 
et = Jte"t-i + Jtvt — K t Wt 
diff t = J t difft_i + (Jt - JtWt-i + vt) + (K t - Kt)w t (17) 

For t > t, both KF-CS and GA-KF run the same fixed dimensional 
and fixed parameter KF for (a;t)jv, with parameters F = I, Q = 
(°~ S yslN, ) N t ,n, , C = An, , R = <t I, but with different initial con- 
ditions. KF-CS uses x t ,, Pt, + i\t, 7^ E[e t „-f lej^jjyi • ■ - Vt,] while 
GA-KF uses the correct initial conditions, xt,,GAKF, Pt* + \\t* — 
E[e t »+ie' t , +1 |2/i, . . .y t ,] = E[e t » + iei, +1 ]. Since |TV*| < S max and 
<^s max < 1, C = An, is full rank. Thus (I,C) is observable. Also, 
since Q is full rank, (I, Q 1//2 ) is controllable. Thus, starting from any 
initial condition, Pt+i\t will converge to a positive semi-definite, P,, 
which is the unique solution of the discrete algebraic Riccati equation 
with parameters F,Q,C,R 1131 Theorem 8.7.1]. Consequently Kt 
and J t will also converge to K, = P, An,' (An, -P* An,' + a 2 /) -1 
and J, = I — K,An, respectively. For t > t,, the GA-KF also runs 
the same KF. Thus, P t \t~i, K t , Jt will also converge to P,, K,, J, 
respectively 1131 Theorem 8.7.1]. Next, we use this fact to show that 
the estimation errors also converge in mean square. 

Using (131 Theorem E.5.1], J* is stable, i.e. its spectral radius p = 
p(J,) < 1. Let e = (1- p)/2. By (2TJ Lemma 5.6.10], there exists 
a matrix norm, denoted || .|| p , s.t. || J* || P < P + to = (1 + p)/2 < 1. 

Consider any e < (1 — p)/4. The above results imply that there 
exists at c >t, s.t. for all t > t e , \\K t — K t \\ < e, \\J t — Jt\\ < e 
and \\J t \\ P < \\J*\\ p + e< (1 + p)/2 + (1 - p)/4 = (3 + p)/4< 1. 
Now, the last set of undetected elements of 7Y* are detected at t*. 
Thus at t,, KF-CS computes a final LS estimate, i.e. x t , = ^Ijv.^t,, 
i't.lt.-i = P, = (A' n ,A n ,)- 1 <j 2 , K t , = (A'^An,)- 1 ^ 
and J t , = 0. None of these depend on yi . . . y t , and hence the future 
values of x t or of Pt, Jt, Kt etc also do not. Hence t t also does not. 

Since Pt\t-i — > P*, Pt\t-i is bounded. Since p < Pt\t-i, Pt is 
also bounded, i.e. there exists a B < oo s.t. tr(Pt) < B, Vt. Since 
E[e t e' t \yi . . . yt,] = Pt = E[e t e't], thus E[||e 2 ||] = tr(P) < B. 

Thus, using l |17t , the following holds for all t>t e , 



,2,1/2 



+ ||L M J| sup E[|| Mr || 2 ] 

t,<T<t 



2-, 1/2 



where 



E[||diff t || 2 ] 1/2 < 
||M Me || E[||diff te 

U T = ( J T — J T )(e~T-l + Vt) + (K r ~ K T )w T , 

t t 

Mt, u = J k , Lt, u =I + Jt + JtJt-i + - Jk (18) 



k=t e + l 



k=t t + l 



Since neither t e , nor the matrices Jt or Kt for t > t,, depend on 

yi, . . .yt,, we do not need to condition the expectation on j/i, yt,. 

Notice that 



2) ||Mt, t Jp < nt=t e +i II j -IIp < a ^ tc with a - (3 + p)/4 < 
1. Thus ||A/ tjt J| < c p ^a t ~ t ^ where c Pi 2 is the smallest real 
number satisfying ||A/|| < c p .2||M|| p , for all size | TV* | square 
matrices M (holds because of equivalence of norms). 

3) ||it,ij| p < l + a + .-.a^' 5 < -p^.Thus ||L t ,iJ| < 

Combining the above facts, for all t > t c , E[||difft|| 
c p , 2 a t - t «E[l|di fF te l| a ] 1 / 2 + Ce where a := (3 + p)/4," C := 
+ sJ\N,\a'i ys + Vna 2 ). Notice that a < 1. Consider an 
e < 2C(1 — p)/4 and set e = e/2C. It is easy to see that for all 



(!-<•)• 

,2,1/2 < 



log(E[||diff t . 



I1C 



| 2 ] 1/2 )+log(2c p , 2 )- 



-,E[||diff t || 2 ] 1/2 < I. 



t > t i/2C + log(l/a) 

Thus, conditioned on D, diff t converges to zero in mean square. 

By Markov's inequality, this also implies convergence in probabil- 
ity, i.e. for a given e, e elr , there exists a tkf(^, e C ir, TV*) > s.t. for 
all t>t* +tkf(£, e<jrr, TV*), Pr(||difft|| 2 < e m \ D) > (1-e). The 
proof of Corollary Q] follows directly from this. 
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